EN FR
EN FR


Section: New Results

Middleware for Grid and Cloud computing

RDF Data Storage and Retrieval In P2P Systems

Participants : I. Filali, F. Huet, F. Baude, F. Bongiovanni, L. Pellegrino, B. Sauvan, I. Alshabani, A. Bourdin.

We have proposed in the context of the SOA4ALL FP7-IP project ( 8.3.1.1 ) the design and the implementation of a hierarchical Semantic Space infrastructure based on Structured Overlay Networks (SONS) [46] ,[10] . It aims at the storage and the retrieval of the semantic description of services at the Web scale [47] . This infrastructure combines the strengths of both P2P paradigm at the architectural level and the Resource Description Framework (RDF) data model at the knowledge representation level. As it is designed, the proposed infrastructure enables the processing of simple and complex queries. This year, the following achievements have been realised.

  • A thorough survey of the existing works that have adapted the combination of RDF data model and the P2P communication model to build distributed infrastructures for RDF data storage and retrieval has been performed. This effort was published in a journal [34] . A previous but more complete version of this work can be found in a research report  [45] and was used extensively in [39] , [38] , [36] .

  • We provided and presented in [23] an implementation of a three dimensional CAN overlay network for storing and retrieving RDF triples. At the implementation level, a modular and flexible architecture for the Semantic Space infrastructure has been proposed. The implementation relies on the ProActive Grid middleware and provides a clear separation between its sub-components (overlay, storage, query engine, etc.). The modularity of the architecture is combined with the decentralized aspect of the infrastructure enabling the RDF data storage and retrieval at large scale. The evaluation of the infrastructure through micro-benchmarks experiments on clusters and grids shows the impact of the architecture and data distribution on the performance of the storage and processing mechanisms.

In the context of the FP7 Strep PLAY ( 8.3.1.2 ) and French ANR SocEDA ( 8.2.2 ) research projects, we have extended the aforementioned work with a content-based Publish/Subscribe abstraction in order to support asynchronous queries for RDF-based events in large scale settings. In order to support these queries efficiently, we worked on an efficient broadcast primitive on top of CAN which we formalized and implemented (see section  6.3.2 ). We are also working towards a generalization of this broadcast algorithm to a multicast one, and contribute intensively to the general integration effort for offering such innovative semantic described event marketplace platform at cloud scale [41] .

An algorithm for efficient broadcast over CAN-like P2P networks

Participants : L. Henrio, F. Bongiovani, A. Craciun.

The nature of some large-scale applications such as content delivery systems or publish/subscribe systems, built on top of structured overlay networks, demands application-level dissemination primitives which do not overwhelm the overlay and which are also reliable. Building such communication primitives in a reliable manner on top of such networks would increase the confidence regarding their behavior prior to deploying them in real settings. In order to come up with real efficient primitives, we take advantage of the underlying geometric topology of the overlay network and we also model the way peers communicate with one another. For this our objective is to design and prove an efficient and reliable broadcast algorithm for CAN-like P2P networks. To this aim, this year we:

  • Formalised in Isabelle/HOL a CAN-like P2P system, devised formalised tools to reason on CAN topologies, and on communication protocols on top of CANs. We proved first completeness and correctness properties on some classes of broadcast algorithms.

  • Designed an efficient broadcast algorithm on top of CAN and implemented it.

Preliminary results were presented at CFSE and published as a research report [37] ; another publication is being written.under way.

Matlab/Scilab parallel programming

Participant : F. Viale.

Matlab & Scilab, with millions of users around the world, are industry standards for numerical computing. They both lack a powerful and modern parallel computing framework to meet the industry's growing demand in terms of parallel processing. This activity is intended to integrate into both softwares a toolbox for parallel processing, based on ProActive.

  • This year, we implemented a ProActive Scilab toolbox with the same functionalities as the ProActive Matlab toolbox we built last year.

  • We added in the Matlab toolbox a disconnected mode to allow closing the Matlab session while uncomplete Matlab jobs are still running on the scheduler side, and retrieving the job results at the next connection.

  • We reorganized both Matlab and Scilab toolboxes with a cleaner and more intuitive API, a stronger and stabler implementation and an extensive documentation. We designed as well unit-tests to make the toolboxes fully usable for production standards.

  • The Scilab toolbox is now deployed on PACAGrid cluster and used extensively by other partners of the OMD2 ANR.

Network Aware Cloud Computing

Participants : S. Malik, F. Huet.

We have proposed a cloud scheduler module named Network Awareness Module (NAM), which helps the scheduler to take the more efficient scheduling decisions on the basis of resource features,such as network latency, reliability, environment compatibility and monetary cost issues.

  • Currently we are working on Reliability Assessment based Scheduling on Cloud Infrastructure. We are building a model, which enables a scheduler to schedule the tasks on cloud infrastructure, on the basis of adaptive reliability of nodes (virtual machine). The core of this model is a reliability assessment algorithm, which computes the reliability for individual resources and for the group of resources.

  • We have proposed, designed and implemented an algorithm for the grouping of nodes on the basis of inter-node latencies. This algorithm can do the dynamic grouping and work with the incomplete latency information available. It groups the nodes on the basis of node latency instead of neighbor count. It produces mutually exclusive groups and can perform group reconfiguration.

  • We have designed a model of the Virtual Cloud [27] . The concept of Virtual cloud revolves around the concept, “Rent Out the Rented Resources”. In this model, cloud vendors offer low cost cloud services by acquiring underutilized resources from some big third-party enterprise. The cloud vendor then further rents out those resources/services to the cloud users. The upfront and administrative costs for the Virtual cloud vendor are lower, and the cloud users access services at a cheaper rate.

  • We have proposed a fault tolerance model for real time cloud computing [27] . In this model, the system tolerates the faults and makes the decision on the basis of reliability of the virtual machines. The reliability of the virtual machines is adaptive, which changes after every computing cycle. A metric model is given for the reliability assessment. The system provides both the forward and backward recovery mechanisms.